查看原文
其他

use clusterProfiler as an universal enrichment analysis tool

Y叔叔 biobabble 2020-02-05

总有人觉得只是做GO/KEGG的,其实那只不过是内置「快捷方式」而已,clusterProfiler是个通用工具,允许使用用户自己数据进化富集分析,那么就是你想做COG,domain的富集分析都不在话下。你要做KEGG,不一定要用enrichKEGGgseKEGG,你要做GO,也不一定要通过enrichGOgseGO,所以非模式、没有OrgDb,KEGG数据库没收录,这些都不是问题,你要想的是怎么搞个注释出来,各种数据库、(帮你做注释的)软件,搞到注释之后,今天要介绍的,就是你想要的,抛开那些条条框框。

clusterProfiler supports enrichment analysis of both hypergeometric test and gene set enrichment analysis. It internally supports Gene Ontology analysis of about 20 species, Kyoto Encyclopedia of Genes and Genomes (KEGG) with all species that have annotation available in KEGG database, DAVID annotation (only hypergeometric test supported), Disease Ontology and Network of Cancer Genes (via DOSE for human) and Reactome Pathway (via ReactomePA for several species). This is still not enough for users may want to analyze their data with unsupported organisms, slim version of GO, novel functional annotation (eg GO via blastgo and KEGG via KAAS), unsupported ontology/pathway or customized annotation.

clusterProfiler provides enricher function for hypergeometric test and GSEA function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters TERM2GENE and
TERM2NAME. As indicated in the parameter names, TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional.

Here I used DisGeNET which contains 381056 associations, between 16666 genes and 13172 diseases, as an example to demonstrate the usage of enricher and GSEA functions.

> require(DOSE)
> require(clusterProfiler)
> data(geneList)
> deg <- names(geneList)[abs(geneList)>2]
## downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz
> gda <- read.delim("all_gene_disease_associations.txt")
> dim(gda)
[1] 415681      9
> head(gda)
 geneId geneSymbol                                    geneName     diseaseId
1    540      ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202
2    540      ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202
3    540      ATP7B ATPase, Cu++ transporting, beta polypeptide umls:C0019202
4   4160       MC4R                     melanocortin 4 receptor umls:C0028754
5   4160       MC4R                     melanocortin 4 receptor umls:C0028754
6   3667       IRS1                insulin receptor substrate 1 umls:C0011860
                   diseaseName     score NumberOfPubmeds   associationType
1 Hepatolenticular Degeneration 0.9898764             229  GeneticVariation
2 Hepatolenticular Degeneration 0.9898764             229         Biomarker
3 Hepatolenticular Degeneration 0.9898764             229 AlteredExpression
4                       Obesity 0.9400000             234  GeneticVariation
5                       Obesity 0.9400000             234         Biomarker
6     Diabetes Mellitus, Type 2 0.9077659             106  GeneticVariation
                                           source
1 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree
2 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree
3 UNIPROT, CTD_human, MGD, RGD, GAD, LHGDN, BeFree
4        UNIPROT, CTD_human, MGD, RGD, GAD, BeFree
5        UNIPROT, CTD_human, MGD, RGD, GAD, BeFree
6        UNIPROT, CTD_human, MGD, RGD, GAD, BeFree
>
> disease2gene=gda[, c("diseaseId", "geneId")]
> disease2name=gda[, c("diseaseId", "diseaseName")]
> x = enricher(deg, TERM2GENE=disease2gene, TERM2NAME=disease2name)
> head(summary(x))
                        ID                  Description GeneRatio    BgRatio
umls:C0006142 umls:C0006142    Malignant neoplasm breast    89/196 3324/16666
umls:C0006826 umls:C0006826           NEOPLASM MALIGNANT   113/196 5102/16666
umls:C0678222 umls:C0678222             Breast Carcinoma    83/196 3072/16666
umls:C1458156 umls:C1458156 Recurrent Malignant Neoplasm    47/196 1158/16666
umls:C1458155 umls:C1458155             Breast Neoplasms    60/196 1981/16666
umls:C0600139 umls:C0600139           prostate carcinoma    56/196 2033/16666
                   pvalue     p.adjust       qvalue
umls:C0006142 4.934860e-16 9.845045e-13 8.249008e-13
umls:C0006826 3.071791e-15 3.064112e-12 2.567371e-12
umls:C0678222 5.894907e-15 3.920113e-12 3.284601e-12
umls:C1458156 3.230248e-14 1.611086e-11 1.349904e-11
umls:C1458155 1.724317e-12 6.880024e-10 5.764664e-10
umls:C0600139 5.025005e-10 1.670814e-07 1.399949e-07
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            geneID
umls:C0006142                                                                                                                                    4312/10874/991/6280/2305/9493/1062/4605/9833/9133/6279/10403/8685/597/7153/6278/259266/3627/27074/6241/7368/11065/55355/9582/55872/51203/3669/22974/10460/10563/4751/820/27338/890/983/4085/6362/9837/5918/332/3832/6286/5163/2146/3002/50852/7272/2568/9212/51659/1111/9055/4321/3620/6790/891/4174/9232/9370/1602/4629/771/3117/80129/23158/125/1311/5174/4250/2167/652/4036/4137/8839/2066/4693/3158/3169/1408/9547/2922/10647/5304/8614/2625/7802/9/5241/10551
umls:C0006826 4312/10874/55388/991/6280/2305/1062/3868/4605/9833/6279/10403/597/7153/6278/79733/3627/27074/6241/55165/9787/7368/11065/9582/55872/51203/3669/83461/22974/10460/10563/4751/6373/8140/1844/4283/27299/27338/890/9415/983/10232/4085/6362/5080/5918/81620/332/3832/6286/5163/2146/3002/50852/7272/2568/2842/9212/8715/1111/9055/3833/4321/11182/10112/3902/3620/3887/51514/6790/4521/891/8544/24137/10578/9232/9370/1602/3708/9122/10699/4629/771/3117/23158/125/4958/1311/2018/1308/4250/652/80736/4036/8839/2066/4693/3169/1408/9547/2922/1524/1580/10647/5304/8614/2625/7802/11122/9/5241/10551/4969
umls:C0678222                                                                                                                                                                    4312/10874/6280/2305/4605/9833/9133/6279/10403/8685/597/7153/6278/259266/3627/27074/6241/55165/11065/55355/9582/55872/51203/3669/10563/4751/820/27338/890/983/4085/6362/9837/5918/332/6286/5163/2146/3002/50852/7272/2568/9212/1111/9055/4321/3620/6790/891/9232/9370/1602/3708/4629/771/3117/23158/125/1311/5174/2532/4250/2167/652/4036/4137/8839/2066/4693/3158/3169/9547/2922/1524/10647/5304/8614/2625/7021/7802/9/5241/10551
umls:C1458156                                                                                                                                                                                                                                                                                                                                                           4312/991/6280/6279/8685/7153/6278/259266/3627/27074/6241/55165/9787/3669/22974/983/10232/4085/5080/332/2146/3002/50852/2568/9212/4321/3620/3887/6790/891/4174/9232/3708/4629/771/3117/23158/730/2018/4036/2066/9547/2625/9/5241/10551/57758
umls:C1458155                                                                                                                                                                                                                                                                                        4312/10874/2305/4605/9833/9133/10403/597/7153/6278/259266/3627/11065/9582/3669/10563/8140/820/1844/27338/890/9415/983/81620/332/2146/3002/2568/9212/51659/1111/9055/11182/3620/51514/6790/4521/891/8544/9232/9370/1602/771/23158/125/4250/2167/652/8839/2066/3169/10647/5304/8614/2625/7021/9/5241/10551/57758
umls:C0600139                                                                                                                                                                                                                                                                                                           4312/6280/2305/4605/9833/6279/7153/3627/6241/55165/11065/22974/10563/8140/1844/890/983/5080/5918/6286/5163/2146/9212/1111/4321/3620/6790/891/8544/9232/9370/8857/1602/3708/23090/4629/3117/23158/2532/2167/652/80736/4036/3169/9547/2922/11283/1524/5304/8614/2625/79689/11122/9/5241/10551
             Count
umls:C0006142    89
umls:C0006826   113
umls:C0678222    83
umls:C1458156    47
umls:C1458155    60
umls:C0600139    56
>
> barplot(x)

> y = GSEA(geneList, TERM2GENE=disease2gene, TERM2NAME=disease2name)
> head(summary(y))
                        ID           Description setSize enrichmentScore
umls:C0001418 umls:C0001418        Adenocarcinoma    1224       0.2431172
umls:C0001973 umls:C0001973            Alcoholism     548      -0.3213479
umls:C0002171 umls:C0002171       Alopecia Areata      74       0.5122605
umls:C0003864 umls:C0003864             Arthritis     447       0.3140411
umls:C0003872 umls:C0003872  Arthritis, Psoriatic     122       0.4311280
umls:C0003873 umls:C0003873 Arthritis, Rheumatoid    1186       0.2855594
             pvalue p.adjust qvalues
umls:C0001418      0        0       0
umls:C0001973      0        0       0
umls:C0002171      0        0       0
umls:C0003864      0        0       0
umls:C0003872      0        0       0
umls:C0003873      0        0       0
> gseaplot(y, "umls:C0003872")

Citation

Yu G, Wang L, Han Y and He Q*. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.


    您可能也对以下帖子感兴趣

    文章有问题?点此查看未经处理的缓存